Computer vision includes methods for acquiring, processing, analyzing, and understanding digital images, and for extracting data from the real world. It is an interdisciplinary field that deals with how computers can gain a high-level understanding of digital images, and it aims to mimic human vision.
Convolutional neural networks are now capable of outperforming humans on some computer vision tasks,
such as classifying images.
In this project, I provide a solution to the Landmark Recognition Problem: given an input photo of a place anywhere in the world, the computer recognizes and labels the landmark at which the photo was taken.
The dataset I used for this project is the Kaggle Google Landmark Recognition 2019 dataset, which can be downloaded from the Common Visual Data Foundation's Google Landmarks Dataset v2.
import numpy as np
import pandas as pd
from IPython.display import display # Allows the use of display() for DataFrames
import matplotlib.pyplot as plt
# Pretty display for notebooks
%matplotlib inline
import sys, os
from os import path
import csv
import pickle #object binary serialization
os.environ['TF_CPP_MIN_LOG_LEVEL'] = '3' #suppress tensorflow warnings
from keras.models import Sequential
from keras.layers import Activation, Dropout, Flatten, Dense, Conv2D, MaxPooling2D, GlobalAveragePooling2D
from keras.callbacks import EarlyStopping, ModelCheckpoint
from keras.utils.np_utils import to_categorical
from keras.utils import plot_model
from keras import applications, optimizers
from keras.preprocessing.image import ImageDataGenerator, img_to_array, load_img
from google.colab import drive
drive.mount('/gdrive')
cd /
%%shell
cp -v /gdrive/My\ Drive/shared/Udacity_MLND_capstone_dataset/data.tar.gz /
tar xvzf data.tar.gz
mkdir models
mkdir -p docs/figures
mkdir -p docs/stats
# Data path definitions:
data_dir = "../data"
input_csv_dir = path.join(data_dir,"input_csv") # CSV files that were downloaded from Kaggle
train_dir = path.join(data_dir, "train") # Training images directory
validation_dir = path.join(data_dir, "validation") # Validation images directory
test_dir = path.join(data_dir, "test") # Test images directory
test_images_dir = path.join(data_dir, "test_images") #test images directory (unlabeled images used for prediction)
models_dir = "../models"
docs_dir = "../docs"
stats_dir = path.join(docs_dir, "stats") #Output csv statistics directory
figures_dir = path.join(docs_dir, "figures") #Output directory for saving figures
Have you ever gone through your vacation photos and asked yourself: What is the name of this temple I visited in China? Who created this monument I saw in France? Landmark recognition can help! This technology predicts landmark labels directly from image pixels, helping people better understand and organize their photo collections.
This problem was inspired by the Google Landmark Recognition 2019 Challenge on Kaggle.
Landmark recognition is a little different from other classification problems: it involves a much larger number of classes (a total of 15K classes in this challenge), and the number of training examples per class may not be very large. Landmark recognition is challenging in its own way.
This is a multi-class classification problem: I built a classifier that can be trained on the given dataset and then used to predict the landmark class of a given input image.
I have chosen convolutional neural networks and transfer learning as my classification approach, since CNNs yield better results than traditional computer vision algorithms. I first trained a basic convolutional neural network, then used pre-trained VGG16 and Xception models with transfer learning to solve the Google Landmark Recognition 2019 problem.
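The transfer-learning workflow used below (run images through a frozen pre-trained network once, then train only a small classifier head on the resulting "bottleneck" features) can be sketched with NumPy alone. The random projection standing in for the pre-trained network, and all shapes and data here, are illustrative assumptions, not the actual models used later in this notebook:

```python
import numpy as np

rng = np.random.default_rng(0)

# Stand-in for a frozen pre-trained network: a fixed random projection
# followed by ReLU. (VGG16/Xception play this role later in the notebook.)
W_frozen = rng.normal(size=(64, 16))

def extract_bottleneck(images):
    # The "pre-trained" weights are never updated.
    return np.maximum(images @ W_frozen, 0.0)

# Toy data: 200 samples, 64 raw features, binary label.
X = rng.normal(size=(200, 64))
y = (X[:, 0] + X[:, 1] > 0).astype(float)

feats = extract_bottleneck(X)  # "bottleneck features", computed once

# Small trainable head: logistic regression fit by gradient descent.
w, b = np.zeros(16), 0.0
for _ in range(500):
    z = np.clip(feats @ w + b, -30.0, 30.0)  # clip to avoid overflow in exp
    p = 1.0 / (1.0 + np.exp(-z))
    w -= 0.5 * (feats.T @ (p - y)) / len(y)
    b -= 0.5 * float(np.mean(p - y))

acc = float(np.mean((p > 0.5) == (y == 1.0)))
print(f"training accuracy of the head: {acc:.2f}")
```

Only `w` and `b` are updated; `W_frozen` stays fixed, which is exactly why transfer learning trains so much faster than training a full network from scratch.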
This metric is also known as micro Average Precision (microAP), as per F. Perronnin, Y. Liu, and J.-M. Renders, "A Family of Contextual Measures of Similarity between Distributions with Application to Image Retrieval," Proc. CVPR'09
$$\mathrm{GAP} = \frac{1}{M}\sum_{i=1}^{N} P(i)\,\mathrm{rel}(i)$$
where:
- $N$ is the total number of predictions,
- $M$ is the number of samples with a ground-truth label,
- $P(i)$ is the precision at rank $i$ (predictions sorted by descending confidence),
- $\mathrm{rel}(i)$ is 1 if prediction $i$ is correct and 0 otherwise.
Script Source : Kaggle - David Thaler - Gap Metric
# Script Source : [Kaggle - David Thaler - Gap Metric](https://www.kaggle.com/davidthaler/gap-metric)
def GAP_vector(pred, conf, true, return_x=False):
'''
Compute Global Average Precision (aka micro AP), the metric for the
Google Landmark Recognition competition.
This function takes predictions, labels and confidence scores as vectors.
In both predictions and ground-truth, use None/np.nan for "no label".
Args:
pred: vector of integer-coded predictions
conf: vector of probability or confidence scores for pred
true: vector of integer-coded labels for ground truth
return_x: also return the data frame used in the calculation
Returns:
GAP score
'''
x = pd.DataFrame({'pred': pred, 'conf': conf, 'true': true})
x.sort_values('conf', ascending=False, inplace=True, na_position='last')
x['correct'] = (x.true == x.pred).astype(int)
x['prec_k'] = x.correct.cumsum() / (np.arange(len(x)) + 1)
x['term'] = x.prec_k * x.correct
gap = x.term.sum() / x.true.count()
if return_x:
return gap, x
else:
return gap
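As a quick sanity check of the metric's behaviour, here is a minimal NumPy-only restatement of the same computation as `GAP_vector` above, applied to made-up toy vectors (it assumes every sample has a ground-truth label, which is the case in this project):

```python
import numpy as np

def gap_toy(pred, conf, true):
    # Mirror of GAP_vector above, without pandas: sort by descending
    # confidence, accumulate precision-at-k over the correct predictions.
    order = np.argsort(-np.asarray(conf))
    correct = (np.asarray(pred)[order] == np.asarray(true)[order]).astype(float)
    prec_k = np.cumsum(correct) / (np.arange(len(correct)) + 1)
    return float(np.sum(prec_k * correct) / len(true))

# The confident prediction is right, the less confident one is wrong:
print(gap_toy([0, 1], [0.9, 0.5], [0, 0]))             # 0.5
# A wrong prediction in the middle drags down later correct ones:
print(gap_toy([1, 2, 3], [0.9, 0.8, 0.7], [1, 3, 3]))  # 5/9 ≈ 0.5556
```

Note how GAP rewards placing correct predictions at high confidence: a mistake ranked above a correct answer lowers the precision term of every correct prediction that follows it.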
Here we define a class that records performance metrics to a CSV file, and initialize an instance of it:
class StatsCSV:
def __init__(self, csv_file):
self.csv_file = csv_file
with open(self.csv_file, 'w', newline='') as csvfile:
header_writer = csv.writer(csvfile)
header_writer.writerow(['Model', 'Test Loss', 'Test Accuracy', 'Test GAP'])
def add_stats(self,model_name, loss, accuracy, GAP):
with open(self.csv_file, 'a', newline='') as csvfile:
stats_writer = csv.writer(csvfile)
stats_writer.writerow([model_name, round(loss,4), round(accuracy * 100, 2), round(GAP * 100, 2)])
stats_csv = StatsCSV(path.join(stats_dir, "stats.csv"))
class IndexCSV:
def __init__(self, name, index_csv_dir, index_csv_file):
try:
self.index_data = pd.read_csv(path.join(index_csv_dir, index_csv_file), usecols=['id', 'category'])
print("File has {} samples with {} features each.".format(*self.index_data.shape))
except:
print("File could not be loaded. Is the dataset missing?")
self.name = name
self.fig = None
self.ax = None
self.freqs = None
def get_freqs(self):
self.freqs = self.index_data['category'].value_counts().to_frame()
self.freqs.columns = ['images_count']
return self.freqs
def get_plot(self):
#self.fig, self.ax = plt.subplots()
self.ax = self.index_data['category'].value_counts().plot.barh(title=self.name)
self.ax.set(xlabel='Images Count', ylabel='Landmark (Class) Name')
self.ax.grid(True)
return self.ax
Load the index CSV files:
index_train_csv = IndexCSV("Training Data", data_dir, "index_train.csv")
index_validation_csv = IndexCSV("Validation Data", data_dir, "index_validation.csv")
index_test_csv = IndexCSV("Test Data", data_dir, "index_test.csv")
landmarks_list = index_validation_csv.index_data.groupby('category').nunique().index.to_list()
landmarks_df = pd.DataFrame(landmarks_list , index=np.arange(1, len(landmarks_list) + 1), columns = ['Landmarks'])
display(landmarks_df)
landmarks_df.to_csv(os.path.join(stats_dir, "selected_landmarks.csv"), index=True) #Save to CSV
train_freqs = index_train_csv.get_freqs()
display(train_freqs)
train_freqs.to_csv(os.path.join(stats_dir, "train_freqs.csv"), index=True)
validation_freqs = index_validation_csv.get_freqs()
display(validation_freqs)
validation_freqs.to_csv(os.path.join(stats_dir, "validation_freqs.csv"), index=True)
test_freqs = index_test_csv.get_freqs()
display(test_freqs)
test_freqs.to_csv(os.path.join(stats_dir, "test_freqs.csv"), index=True)
plt.figure(figsize=(8, 14))
plt.subplot(311)
index_train_ax = index_train_csv.get_plot()
plt.subplot(312, sharex=index_train_ax)
index_validation_ax = index_validation_csv.get_plot()
plt.subplot(313, sharex=index_train_ax)
index_test_ax = index_test_csv.get_plot()
index_train_ax.xaxis.set_tick_params(which='both', labelleft=True) # Get ticklabels back on shared axis
index_validation_ax.xaxis.set_tick_params(which='both', labelleft=True) # Get ticklabels back on shared axis
plt.savefig(path.join(figures_dir, "dataset_hbar_plot.pdf"), bbox_inches = 'tight')
plt.show()
The code snippet below is our benchmark model: a simple stack of 3 convolution layers with ReLU activations, each followed by a max-pooling layer. This is very similar to the architectures that Yann LeCun advocated in the 1990s for image classification (with the exception of ReLU); see LeCun et al., "Handwritten Zip Code Recognition with Multilayer Networks."
class BenchmarkDataProcessor:
def __init__(self, input_shape, batch_size, train_dir, validation_dir, test_dir):
self.train_datagen = ImageDataGenerator(
rescale = 1.0/255, #rescale pixel values from [0,255] to [0,1]
rotation_range=40,
width_shift_range=0.2,
height_shift_range=0.2,
shear_range=0.2,
zoom_range=0.2,
horizontal_flip=True,
fill_mode='nearest')
self.test_datagen = ImageDataGenerator(rescale=1.0/255)
self.input_shape = input_shape
self.batch_size = batch_size
self.init_train_generator(train_dir)
self.init_validation_generator(validation_dir)
self.init_test_generator(test_dir)
def __init_generator(self, datagen, images_dir):
return datagen.flow_from_directory(
directory=images_dir , # this is the target directory
target_size=self.input_shape[:2], # all images will be resized to input shape 224x224
color_mode="rgb",
batch_size=self.batch_size,
class_mode='categorical',
shuffle=False)
def init_train_generator(self, train_dir):
self.train_generator = self.__init_generator(self.train_datagen, train_dir)
def init_validation_generator(self, validation_dir):
self.validation_generator = self.__init_generator(self.test_datagen, validation_dir)
def init_test_generator(self, test_dir):
self.test_generator = self.__init_generator(self.test_datagen, test_dir)
input_shape=(224, 224, 3)
batch_size = 16
benchmark_data_processor = BenchmarkDataProcessor(
input_shape,
batch_size,
train_dir,
validation_dir,
test_dir)
img = load_img(path.join(train_dir, "Kazan/3e222ad7d1469deb.jpg"))
img_array = img_to_array(img)
plt.imshow(img_array/255)
plt.title("Original Image")
plt.savefig(path.join(figures_dir, "augmented_image_original.pdf"), bbox_inches = 'tight')
plt.show()
#----------------------------
columns = 4
rows = 5
fig = plt.figure(figsize=(20,20))
for i, batch in enumerate(benchmark_data_processor.train_datagen.flow(img_array.reshape((1,) + img_array.shape), batch_size=1)):
    if i > rows * columns - 1:
        break
    fig.add_subplot(rows, columns, i + 1)
    plt.imshow(batch[0])
plt.savefig(path.join(figures_dir, "augmented_image_transformations.pdf"), bbox_inches='tight') # save once, after the grid is complete
plt.show()
# My benchmark model - a simple CNN
benchmark_model = Sequential()
benchmark_model.add(Conv2D(32, (3, 3), padding='same', activation='relu', input_shape=input_shape))
benchmark_model.add(MaxPooling2D(pool_size=(2, 2)))
benchmark_model.add(Conv2D(32, (3, 3), padding='same', activation='relu'))
benchmark_model.add(MaxPooling2D(pool_size=(2, 2)))
benchmark_model.add(Conv2D(64, (3, 3), padding='same', activation='relu'))
benchmark_model.add(MaxPooling2D(pool_size=(2, 2)))
benchmark_model.add(GlobalAveragePooling2D())
#benchmark_model.add(Flatten())
benchmark_model.add(Dense(64, activation='relu'))
benchmark_model.add(Dropout(0.5))
benchmark_model.add(Dense(10, activation='softmax'))
benchmark_model.summary()
benchmark_model.compile(loss='categorical_crossentropy',
optimizer='rmsprop',
metrics=['accuracy'])
# Save Model Architecture to file
plot_model(benchmark_model,
to_file=path.join(figures_dir, "benchmark_model_architecture.pdf"),
show_shapes=True,
show_layer_names=False
)
benchmark_weights_file = path.join(models_dir, "benchmark-model_weights.hdf5")
benchmark_history_file = path.join(models_dir, "benchmark-model_history.pickle")
#Checkpointer to save model best weights
benchmark_checkpointer = ModelCheckpoint(filepath = benchmark_weights_file,
monitor='val_acc',
verbose=1,
save_best_only=True)
#Early Stopping
early_stopping = EarlyStopping(monitor='val_acc',
verbose=1,
patience=10)
epochs = 50
steps_per_epoch = benchmark_data_processor.train_generator.samples // benchmark_data_processor.batch_size
validation_steps = benchmark_data_processor.validation_generator.samples // benchmark_data_processor.batch_size
#epochs = 5
#steps_per_epoch = 15
#validation_steps = 4
benchmark_model_history = benchmark_model.fit_generator(
benchmark_data_processor.train_generator,
steps_per_epoch=steps_per_epoch,
epochs=epochs,
callbacks = [benchmark_checkpointer, early_stopping],
validation_data=benchmark_data_processor.validation_generator,
validation_steps=validation_steps,
verbose=1)
# Save Model History
with open(benchmark_history_file, 'wb') as pickle_file:
pickle.dump(benchmark_model_history, pickle_file)
# Load Model History
with open(benchmark_history_file, 'rb') as pickle_file:
benchmark_model_history = pickle.load(pickle_file)
# Loading Best Weights
benchmark_model.load_weights(benchmark_weights_file)
def plot_learning_curves(model_history, model_name, plot_filename):
fig, ax = plt.subplots(2, 1)
fig.set_size_inches(8, 12)
#fig.suptitle(model_name + ' Performance Metrics')
ax[0].plot(model_history.history['acc'])
ax[0].plot(model_history.history['val_acc'])
ax[0].set_title(model_name + ' Performance Metrics\n' + 'Accuracy')
ax[0].legend(['Training', 'Validation'], loc='upper left')
ax[0].set_xlabel('Epoch')
ax[0].set_ylabel('Accuracy')
ax[0].grid(True)
ax[1].plot(model_history.history['loss'])
ax[1].plot(model_history.history['val_loss'])
ax[1].set_title('Loss')
ax[1].legend(['Training', 'Validation'], loc='upper right')
ax[1].set_xlabel('Epoch')
ax[1].set_ylabel('Loss')
ax[1].grid(True)
plt.savefig(plot_filename, bbox_inches = 'tight')
plt.show()
plot_learning_curves(benchmark_model_history, 'Benchmark Model', path.join(figures_dir, "benchmark_model_metrics.pdf"))
Accuracy:
[benchmark_test_loss, benchmark_test_accuracy] = benchmark_model.evaluate_generator(benchmark_data_processor.test_generator, steps=len(benchmark_data_processor.test_generator), workers=2, use_multiprocessing=True, verbose=1) # steps = number of batches in one full pass over the test set
print(benchmark_model.metrics_names)
print([benchmark_test_loss, benchmark_test_accuracy])
GAP:
def get_GAP(model, test_generator):
test_generator.reset()
samples_count = test_generator.samples
images = np.empty((0,) + input_shape)
max_batch_index = len(test_generator)
i = 0
for batch in test_generator:
images = np.append(images, batch[0], axis=0) #image data
# print(batch[1][i]) #image labels
i += 1
if i > max_batch_index - 1:
break
probabilities = model.predict(images)
predicted_classes = model.predict_classes(images)
confidence_scores = probabilities[(np.arange(samples_count), predicted_classes)]
true_labels = test_generator.labels
return GAP_vector(predicted_classes, confidence_scores, true_labels)
benchmark_test_GAP = get_GAP(benchmark_model, benchmark_data_processor.test_generator)
print(benchmark_test_GAP)
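The confidence-score lookup inside `get_GAP` above, `probabilities[(np.arange(samples_count), predicted_classes)]`, uses NumPy integer-array indexing: for each row `i` it picks the probability of that row's predicted class. A small self-contained illustration with a made-up probability matrix:

```python
import numpy as np

# Two samples, three classes; each row sums to 1, like a softmax output.
probabilities = np.array([[0.1, 0.7, 0.2],
                          [0.6, 0.3, 0.1]])

predicted_classes = probabilities.argmax(axis=1)  # highest-probability class per row
confidence_scores = probabilities[np.arange(len(probabilities)), predicted_classes]

print(predicted_classes)   # [1 0]
print(confidence_scores)   # [0.7 0.6]
```

Pairing a row-index array with a column-index array selects one element per row, which is exactly the per-sample confidence the GAP metric needs.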
Saving results:
stats_csv.add_stats("Benchmark Model", benchmark_test_loss, benchmark_test_accuracy, benchmark_test_GAP)
class DataProcessor:
def __init__(self, input_shape, batch_size, train_dir, validation_dir, test_dir):
self.train_datagen = ImageDataGenerator(rescale = 1.0/255) #rescale pixel values from [0,255] to [0,1]
self.test_datagen = ImageDataGenerator(rescale=1.0/255)
self.input_shape = input_shape
self.batch_size = batch_size
self.init_train_generator(train_dir)
self.init_validation_generator(validation_dir)
self.init_test_generator(test_dir)
def __init_generator(self, datagen, images_dir):
return datagen.flow_from_directory(
directory=images_dir , # this is the target directory
target_size=self.input_shape[:2], # all images will be resized to the target input shape
batch_size=self.batch_size,
class_mode=None,
shuffle=False)
def init_train_generator(self, train_dir):
self.train_generator = self.__init_generator(self.train_datagen, train_dir)
def init_validation_generator(self, validation_dir):
self.validation_generator = self.__init_generator(self.test_datagen, validation_dir)
def init_test_generator(self, test_dir):
self.test_generator = self.__init_generator(self.test_datagen, test_dir)
input_shape=(224, 224, 3)
batch_size = 1
data_processor = DataProcessor(
input_shape,
batch_size,
train_dir,
validation_dir,
test_dir)
class PretrainedModel:
def __init__(self, model_name, pretrained_model, data_processor, save_path ):
self.model_name = model_name
self.pretrained_model = pretrained_model
self.data_processor = data_processor
self.save_path = save_path
def get_file_path(self, file_name):
return path.join(self.save_path, self.model_name + '_' + file_name)
def predict_bottleneck_features(self):
max_queue_size = 10 # default is 10
workers = 2 # Default is 1
self.bottleneck_features_train = self.pretrained_model.predict_generator(
self.data_processor.train_generator,
steps=self.data_processor.train_generator.samples // self.data_processor.batch_size,
max_queue_size=max_queue_size,
workers=workers,
use_multiprocessing=True, #default is False
verbose=1)
self.bottleneck_features_validation = self.pretrained_model.predict_generator(
self.data_processor.validation_generator,
steps=self.data_processor.validation_generator.samples // self.data_processor.batch_size,
max_queue_size=max_queue_size,
workers=workers,
use_multiprocessing=True,
verbose=1)
self.bottleneck_features_test = self.pretrained_model.predict_generator(
self.data_processor.test_generator,
steps=self.data_processor.test_generator.samples // self.data_processor.batch_size,
max_queue_size=max_queue_size,
workers=workers,
use_multiprocessing=True,
verbose=1)
self.save_bottleneck_features()
def save_bottleneck_features(self):
np.save(open(self.get_file_path('bottleneck_features_train.npz'), 'wb'),
self.bottleneck_features_train)
np.save(open(self.get_file_path('bottleneck_features_validation.npz'), 'wb'),
self.bottleneck_features_validation)
np.save(open(self.get_file_path('bottleneck_features_test.npz'), 'wb'),
self.bottleneck_features_test)
def load_bottleneck_features(self):
self.bottleneck_features_train = np.load(open(self.get_file_path('bottleneck_features_train.npz'), 'rb'))
self.bottleneck_features_validation = np.load(open(self.get_file_path('bottleneck_features_validation.npz'), 'rb'))
self.bottleneck_features_test = np.load(open(self.get_file_path('bottleneck_features_test.npz'), 'rb'))
def create_top_model(self, optimizer):
self.top_model = Sequential()
self.top_model.add(Dense(256, activation='relu', input_shape=self.bottleneck_features_train.shape[1:])) #[1:]
self.top_model.add(Dense(128, activation='relu'))
self.top_model.add(Dropout(0.3))
self.top_model.add(Dense(10, activation='softmax'))
self.top_model.compile(optimizer=optimizer,
loss='categorical_crossentropy',
metrics=['accuracy'])
self.top_model.summary()
def save_top_model_graph(self, figures_path): # Save Top Model Architecture to file
plot_model(self.top_model,
to_file=path.join(figures_path, self.model_name +"_top-model_architecture.pdf"),
show_shapes=True,
show_layer_names=False
)
def train_top_model(self, epochs, batch_size):
early_stopping = EarlyStopping(monitor='val_acc', verbose=1, patience=100)
checkpointer = ModelCheckpoint(
filepath=self.get_file_path('top-model_weights.hdf5'),
monitor='val_acc',
verbose=0,
save_best_only=True)
train_labels = to_categorical(self.data_processor.train_generator.classes)
validation_labels = to_categorical(self.data_processor.validation_generator.classes)
self.history = self.top_model.fit(self.bottleneck_features_train,
train_labels,
epochs=epochs,
batch_size=batch_size,
validation_data=(self.bottleneck_features_validation, validation_labels),
callbacks=[checkpointer, early_stopping],
verbose=1)
def save_top_model_history(self):
with open(self.get_file_path('top-model_history.pickle'), 'wb') as pickle_file:
pickle.dump(self.history, pickle_file)
def load_top_model_history(self):
with open(self.get_file_path('top-model_history.pickle'), 'rb') as pickle_file:
self.history = pickle.load(pickle_file)
def load_top_model_weights(self):
self.top_model.load_weights(self.get_file_path('top-model_weights.hdf5'))
def test_top_model(self):
self.load_top_model_weights()
test_labels = to_categorical(self.data_processor.test_generator.classes)
stats = self.top_model.evaluate(self.bottleneck_features_test, test_labels, workers=2, use_multiprocessing=True, verbose=1)
print(stats)
return stats
def get_GAP(self):
samples_count = self.data_processor.test_generator.samples
probabilities = self.top_model.predict(self.bottleneck_features_test)
predicted_classes = self.top_model.predict_classes(self.bottleneck_features_test)
confidence_scores = probabilities[(np.arange(samples_count), predicted_classes)]
true_labels = self.data_processor.test_generator.labels
GAP = GAP_vector(predicted_classes, confidence_scores, true_labels)
print(GAP)
return GAP
def plot_learning_curves(self, save_path):
fig, ax = plt.subplots(1, 2)
fig.set_size_inches(16,4)
fig.suptitle(self.model_name + ' Performance Metrics')
ax[0].plot(self.history.history['acc'])
ax[0].plot(self.history.history['val_acc'])
ax[0].set_title('Accuracy')
ax[0].legend(['Training', 'Validation'], loc='upper left')
ax[0].set_xlabel('Epoch')
ax[0].set_ylabel('Accuracy')
ax[0].grid(True)
ax[1].plot(self.history.history['loss'])
ax[1].plot(self.history.history['val_loss'])
ax[1].set_title('Loss')
ax[1].legend(['Training', 'Validation'], loc='upper right')
ax[1].set_xlabel('Epoch')
ax[1].set_ylabel('Loss')
ax[1].grid(True)
plt.savefig(path.join(save_path, self.model_name +"_metrics.pdf"), bbox_inches = 'tight')
plt.show()
VGG16_model = PretrainedModel("VGG16_model",
applications.VGG16(include_top=False, weights='imagenet',pooling='avg'),
data_processor,
models_dir)
VGG16_model.predict_bottleneck_features()
VGG16_model.load_bottleneck_features()
VGG16_model.create_top_model(optimizers.SGD(lr=0.01, clipnorm=1.,momentum=0.7))
# Save top-model architecture to file
VGG16_model.save_top_model_graph(figures_dir)
VGG16_model.train_top_model(epochs=1000, batch_size=4096)
VGG16_model.save_top_model_history()
VGG16_model.load_top_model_history()
VGG16_model.load_top_model_weights()
[test_loss, test_accuracy] = VGG16_model.test_top_model()
test_GAP = VGG16_model.get_GAP()
stats_csv.add_stats("VGG16 Model", test_loss, test_accuracy, test_GAP)
VGG16_model.plot_learning_curves(figures_dir)
Xception_model = PretrainedModel("Xception_model",
applications.Xception(include_top=False, weights='imagenet',pooling='avg'),
data_processor,
models_dir)
Xception_model.predict_bottleneck_features()
Xception_model.load_bottleneck_features()
Xception_model.create_top_model(optimizers.SGD(lr=0.01, clipnorm=1.,momentum=0.7))
# Save top-model architecture to file
Xception_model.save_top_model_graph(figures_dir)
Xception_model.train_top_model(epochs=1000, batch_size=4096)
Xception_model.save_top_model_history()
Xception_model.load_top_model_history()
Xception_model.load_top_model_weights()
[test_loss, test_accuracy] = Xception_model.test_top_model()
test_GAP = Xception_model.get_GAP()
stats_csv.add_stats("Xception Model", test_loss, test_accuracy, test_GAP)
Xception_model.plot_learning_curves(figures_dir)
VGG16_model_refinement = PretrainedModel("VGG16_model_refinement",
applications.VGG16(include_top=False, weights='imagenet',pooling='avg'),
data_processor,
models_dir)
VGG16_model_refinement.predict_bottleneck_features()
VGG16_model_refinement.load_bottleneck_features()
VGG16_model_refinement.create_top_model(optimizers.RMSprop(lr=0.001, rho=0.9, epsilon=None, decay=0.0))
VGG16_model_refinement.train_top_model(epochs=1000, batch_size=4096)
VGG16_model_refinement.save_top_model_history()
VGG16_model_refinement.load_top_model_history()
VGG16_model_refinement.load_top_model_weights()
[test_loss, test_accuracy] = VGG16_model_refinement.test_top_model()
test_GAP = VGG16_model_refinement.get_GAP()
stats_csv.add_stats("VGG16 Model(Refined)", test_loss, test_accuracy, test_GAP)
VGG16_model_refinement.plot_learning_curves(figures_dir)
Xception_model_refinement = PretrainedModel("Xception_model_refinement",
applications.Xception(include_top=False, weights='imagenet',pooling='avg'),
data_processor,
models_dir)
Xception_model_refinement.predict_bottleneck_features()
Xception_model_refinement.load_bottleneck_features()
Xception_model_refinement.create_top_model(optimizers.RMSprop(lr=0.001, rho=0.9, epsilon=None, decay=0.0))
Xception_model_refinement.train_top_model(epochs=1000, batch_size=4096)
Xception_model_refinement.save_top_model_history()
Xception_model_refinement.load_top_model_history()
Xception_model_refinement.load_top_model_weights()
[test_loss, test_accuracy] = Xception_model_refinement.test_top_model()
test_GAP = Xception_model_refinement.get_GAP()
stats_csv.add_stats("Xception Model(Refined)", test_loss, test_accuracy, test_GAP)
Xception_model_refinement.plot_learning_curves(figures_dir)
Trying a different image input shape: 150x150.
evaluation_input_shape = (150, 150, 3)
evaluation_data_processor = DataProcessor(
evaluation_input_shape,
batch_size,
train_dir,
validation_dir,
test_dir)
evaluation_model = PretrainedModel("evaluation_model",
applications.Xception(include_top=False, weights='imagenet',pooling='avg'),
evaluation_data_processor,
models_dir)
evaluation_model.predict_bottleneck_features()
evaluation_model.load_bottleneck_features()
evaluation_model.create_top_model(optimizers.RMSprop(lr=0.001, rho=0.9, epsilon=None, decay=0.0))
evaluation_model.train_top_model(epochs=1000, batch_size=4096)
evaluation_model.save_top_model_history()
#evaluation_model.load_top_model_history()
evaluation_model.load_top_model_weights()
[test_loss, test_accuracy] = evaluation_model.test_top_model()
test_GAP = evaluation_model.get_GAP()
#stats_csv.add_stats("Solution Model Validation", test_loss, test_accuracy, test_GAP)
evaluation_model.plot_learning_curves(figures_dir)
The final solution model achieves an 84.91% test accuracy and an 82.68% GAP score. This is significantly higher than the benchmark model, which reached a 62.29% test accuracy and a 55.85% GAP score.
from tqdm import tqdm
from sklearn.datasets import load_files
def extract_VGG16(VGG16model, tensor):
from keras.applications.vgg16 import VGG16, preprocess_input
return VGG16model.predict(preprocess_input(tensor))
def extract_Xception(Xceptionmodel, tensor):
from keras.applications.xception import Xception, preprocess_input
return Xceptionmodel.predict(preprocess_input(tensor))
Here we assign our best solution model to the variable solution_model and its corresponding preprocessing function to the extract_bottleneck_features variable:
solution_model = Xception_model_refinement
extract_bottleneck_features = extract_Xception
def path_to_tensor(img_path):
# loads RGB image as PIL.Image.Image type
img = load_img(img_path, target_size=(224, 224))
# convert PIL.Image.Image type to 3D tensor with shape (224, 224, 3)
x = img_to_array(img)
# convert 3D tensor to 4D tensor with shape (1, 224, 224, 3) and return 4D tensor
return np.expand_dims(x, axis=0)
def paths_to_tensor(img_paths):
list_of_tensors = [path_to_tensor(img_path) for img_path in tqdm(img_paths)]
return np.vstack(list_of_tensors)
# Loading test image paths
test_data = load_files(test_images_dir)
test_images_paths = np.array(test_data['filenames'])
# Getting prediction labels using the solution model:
bottleneck_features = extract_bottleneck_features(solution_model.pretrained_model, paths_to_tensor(test_images_paths))
predicted_classes = solution_model.top_model.predict_classes(bottleneck_features)
predicted_labels = np.array(landmarks_list)[predicted_classes]
# Plot test images and their predicted labels:
columns = len(predicted_labels)
rows = 1
fig = plt.figure(figsize=(20, 20))
for i, image_path in enumerate(test_images_paths):
    fig.add_subplot(rows, columns, i + 1)
    plt.imshow(load_img(image_path))
    plt.title(predicted_labels[i])
plt.savefig(path.join(figures_dir, "visualization_predicted_images.pdf"), bbox_inches='tight') # save once, after all images are drawn
plt.show()
def get_train_image_path(image_id, label):
return path.join(data_dir, 'train', label, image_id + '.jpg')
similar_train_landmarks_ids =['Kazan','Feroz_Shah_Kotla', 'Golden_Gate_Bridge']
similar_train_images_ids = np.array([
['3e222ad7d1469deb', '33e8a9c8e96f20eb', '1586d22396244714'],
['1e33638e9fb39b0d', '2f7b8d029e1402b4', '1e4569e97ea7ade0'],
['5cd678ac220edb5a', '8c188787c993afba', '8e471206a5892d58'],
])
rows = similar_train_images_ids.shape[0]
columns = similar_train_images_ids.shape[1]
fig = plt.figure(figsize=(20,20))
for i, landmark in enumerate(similar_train_images_ids):
    for j, image in enumerate(landmark):
        image_path = get_train_image_path(str(image), str(similar_train_landmarks_ids[i]))
        fig.add_subplot(rows, columns, columns * i + j + 1)
        plt.imshow(load_img(image_path))
        plt.title(similar_train_landmarks_ids[i])
plt.savefig(path.join(figures_dir, "visualization_training_samples.pdf"), bbox_inches='tight') # save once, after the grid is complete
plt.show()
The entire end-to-end solution can be summarized as follows:
1. Download and extract the dataset, and define the data directory structure.
2. Explore the index CSV files and visualize the class distribution of the training, validation, and test sets.
3. Train a benchmark CNN from scratch, using data augmentation.
4. Extract bottleneck features with the pre-trained VGG16 and Xception networks, and train small fully-connected top models on those features.
5. Refine the top models by switching the optimizer from SGD to RMSprop.
6. Evaluate each model by test accuracy and the GAP metric, and select the best one as the solution model.
7. Use the solution model to predict landmark labels for new, unlabeled images.
An interesting aspect of this project was that the transfer-learning models achieved strong performance with short training times; they outperformed the benchmark model, which took much longer to train.
The challenging aspects of the project were selecting the classes and extracting a balanced subset of the large publicly available dataset, as well as implementing the GAP performance metric and constructing the correct input vectors for it.
As this project's implementation shows, convolutional neural networks in general, and transfer-learning techniques in particular, are currently the best-performing approach to image classification, and they are highly recommended for this and similar problems.
For the improvement of our solution model, my suggestions are the following:
cd /
%%shell
tar cvzf stats.tar.gz docs/
tar cvzf models.tar.gz models/
cp -v stats.tar.gz /gdrive/My\ Drive/shared/Udacity_MLND_capstone_dataset/
cp -v models.tar.gz /gdrive/My\ Drive/shared/Udacity_MLND_capstone_dataset/